SLOPE for Count Responses
STAT 538A Paper Presentation
Refining the Research Focus
Initial Research Direction: Explore the penalization scheme proposed by Zilberman & Abramovich (2025).
Problem: The proposed method relies on Lasso and SLOPE as convex surrogates.
Refined Project Focus: This project centers specifically on SLOPE’s (Sorted L-One Penalized Estimation) performance for count data.
Introduction
SLOPE (Sorted L-One Penalized Estimation) is a method for estimating the parameter \(\beta\) in a parametric statistical model.
- Similar to LASSO, this algorithm incurs a penalty term based on the \(\ell_{1}\) norm of the estimator \(\hat{\beta}\).
- Unlike LASSO, SLOPE does not apply a single constant \(\lambda\) to every coefficient; instead, each coefficient receives its own penalty weight from a non-increasing sequence.
Comparison of \(\ell_{1}\) Penalties
- The penalty term in LASSO regression is \(\lambda\sum_{i=1}^{p}\left\vert\hat{\beta}_{i}\right\vert\).
- The SLOPE penalty is given by \(\sum_{i=1}^{p}\lambda_{i}\left\vert\hat{\beta}_{(i)}\right\vert\).
- In this equation, \(\lambda_{1} \ge \lambda_{2} \ge \dots \ge \lambda_{p} \ge 0\), and the elements of \(\hat{\beta}\) are sorted so that \(\left\vert\hat{\beta}_{(1)}\right\vert \ge \dots \ge \left\vert\hat{\beta}_{(p)}\right\vert\).
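The two penalties above can be contrasted directly in code. This is a minimal sketch (the coefficient and weight values are illustrative, not from the paper):

```python
import numpy as np

def lasso_penalty(beta, lam):
    """Constant-lambda L1 penalty: lam * sum_i |beta_i|."""
    return lam * np.sum(np.abs(beta))

def slope_penalty(beta, lams):
    """Sorted-L1 penalty: sum_i lams[i] * |beta|_(i), where |beta| is
    sorted in decreasing order and lams is non-increasing."""
    sorted_abs = np.sort(np.abs(beta))[::-1]  # |beta|_(1) >= ... >= |beta|_(p)
    return float(np.dot(lams, sorted_abs))

beta = np.array([0.5, -2.0, 1.0])
lams = np.array([1.5, 1.0, 0.5])  # lambda_1 >= lambda_2 >= lambda_3 >= 0

print(lasso_penalty(beta, 1.0))   # 3.5
print(slope_penalty(beta, lams))  # 1.5*2.0 + 1.0*1.0 + 0.5*0.5 = 4.25
```

Note how SLOPE's largest weight attaches to the largest coefficient in absolute value, so big coefficients are penalized more heavily than under a flat \(\lambda\).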
Background - Multiple Hypothesis Testing and FWER
- Example - gene testing with \(n\) patients and \(m > n\) predictors
- Original setting - linear regression with known variance
- FWER - family-wise error rate
- Probability of making at least one type I error
- Bonferroni correction
- Set \(\alpha_{\text{BON}} = \frac{\alpha}{m}\)
Background - FDR
- FDR - false discovery rate
- Expected proportion of false rejections
- Benjamini-Hochberg
- Order the p-values
- Find the largest \(j\) such that \(p_{(j)} \leq \frac{qj}{m}\), and reject the hypotheses corresponding to \(p_{(1)}, \dots, p_{(j)}\)
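The step-up rule above can be sketched as follows (the p-values in the example are illustrative):

```python
import numpy as np

def benjamini_hochberg(pvals, q):
    """Indices of hypotheses rejected by the BH procedure at FDR level q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # Largest j (1-based) with p_(j) <= q*j/m; reject everything up to it.
    below = sorted_p <= q * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    j_max = np.max(np.nonzero(below)[0])  # 0-based position of that j
    return np.sort(order[: j_max + 1])

pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.75]
print(benjamini_hochberg(pvals, q=0.05))  # rejects the two smallest: [0 1]
```

Because the rule rejects *all* hypotheses up to the largest qualifying \(j\), an intermediate p-value above its own threshold can still be rejected; the code handles this by taking the maximum index that falls below the line.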
From Hypothesis Testing to Inference - LASSO
- Model selection can be viewed as multiple hypothesis testing
- Estimating a coefficient as exactly 0 corresponds to declaring that predictor not significant
- For orthogonal design matrices LASSO is equivalent to Bonferroni
- Orthogonal design matrix: columns are orthogonal
- LASSO: \(\min \frac 1 2 \|y-X\beta\|_2^2 + \lambda \|\beta\|_1\)
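To make the LASSO–Bonferroni connection concrete: when \(X^{\top}X = I\), the lasso solution is coordinatewise soft-thresholding of \(X^{\top}y\) at level \(\lambda\), so each coefficient survives exactly when \(|X_j^{\top}y| > \lambda\) — a single per-coordinate threshold, just as Bonferroni applies one cutoff to every test. A minimal sketch:

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form lasso solution in the orthogonal case (X^T X = I):
    shrink each coordinate of z = X^T y toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

print(soft_threshold(np.array([2.5, -0.4, 1.0]), 1.0))  # [1.5, 0., 0.]
```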
From Hypothesis Testing to Inference - SLOPE
- What if we use BH like penalties instead?
- \(\lambda_{\text{BH}}(i) := \Phi^{-1}\left(1-\frac{qi}{2m}\right)\), where \(\Phi\) is the standard normal CDF
- \(\min \frac 1 2 \|y-X\beta\|_2^2 + \sigma \cdot \sum_{i=1}^m \lambda_{\text{BH}}(i)|\beta|_{(i)}\)
- Key differences
- Non-homogeneous penalty (a different \(\lambda_i\) at each rank)
- Sorting of coefficients
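The BH-inspired weight sequence is easy to compute with `scipy.stats.norm.ppf`; this sketch uses illustrative values \(m = 5\), \(q = 0.1\):

```python
import numpy as np
from scipy.stats import norm

def lambda_bh(m, q):
    """BH-inspired SLOPE weights: lambda_i = Phi^{-1}(1 - q*i/(2m))."""
    i = np.arange(1, m + 1)
    return norm.ppf(1 - q * i / (2 * m))

lams = lambda_bh(m=5, q=0.1)
print(lams)  # strictly decreasing, all positive
```

The sequence decreases in \(i\), exactly as SLOPE requires, and mirrors the BH thresholds translated onto the scale of normal quantiles.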
SLOPE on orthogonal design matrices
- Provably controls FDR for orthogonal design matrices with known Gaussian errors
- Convex optimization problem
- Not exactly equivalent to Benjamini-Hochberg
SLOPE in general
- The penalty weights \(\lambda_i\) don't have to come from the BH construction
- They only need to satisfy \(\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_p \geq 0\)
- Suggested use
- Use SLOPE for model selection
- Once a model is selected, find coefficients with OLS
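The refit step is plain least squares on the selected columns. A minimal sketch, assuming the support has already been obtained from SLOPE (the data and support indices here are hypothetical):

```python
import numpy as np

def refit_ols(X, y, support):
    """Zero vector everywhere except OLS coefficients on the selected columns."""
    beta = np.zeros(X.shape[1])
    support = np.asarray(support, dtype=int)
    if support.size:
        beta[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta

# Hypothetical example: suppose SLOPE selected columns 1 and 4.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta_true = np.zeros(10)
beta_true[[1, 4]] = [2.0, -1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat = refit_ols(X, y, support=[1, 4])
```

Refitting removes the shrinkage bias that the sorted-\(\ell_1\) penalty introduces on the nonzero coefficients.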
Experiments
Experimental Design: Proposal
- Research Gap: The original SLOPE paper (Bogdan et al., 2015) focused on linear models with Gaussian errors.
- Goal: Compare SLOPE against other penalized methods (Lasso, Adaptive Lasso) via simulations specifically for count responses (Poisson model).
- Research Question: How does the performance of SLOPE regarding variable selection accuracy (FDR and Power) compare to Lasso and Adaptive Lasso when applied to high-dimensional Poisson regression?
Compared Methods & Penalties
- Objective: Minimize \(- \frac{1}{n} \log L(\beta; y, X) + \text{Penalty}(\beta)\)
- Penalties:
- SLOPE Penalty: \(\sum_{i=1}^{p}\lambda_{i}|\beta|_{(i)}\), with \(|\beta|_{(1)} \ge \dots \ge |\beta|_{(p)}\) and \(\lambda_1 \ge \dots \ge \lambda_p \ge 0\).
- Lasso (L1): \(\lambda ||\beta||_{1}\)
- Adaptive Lasso: \(\lambda \sum_{j=1}^{p} w_j |\beta_j|\) (where \(w_j \propto 1/|\hat{\beta}_{init, j}|^\gamma\))
Experimental Design: Data Generation
- Model: Poisson Regression
- \(Y_i \sim \text{Poisson}(\lambda_i)\)
- \(\log(\lambda_i) = \beta_0 + X_i \beta\)
- Dimensions: \(n = 1000\) observations.
- \(p = 500\) (p < n)
- \(p = 1000\) (p = n)
- \(p = 2000\) (p > n)
- Replications: R = 50 runs per setting.
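The response model above can be sketched as a short simulator (the coefficient values and intercept are illustrative, not the design's actual settings):

```python
import numpy as np

def simulate_poisson(X, beta, beta0=0.0, rng=None):
    """Draw counts Y_i ~ Poisson(lambda_i) with log(lambda_i) = beta0 + X_i beta."""
    rng = np.random.default_rng() if rng is None else rng
    rate = np.exp(beta0 + X @ beta)
    return rng.poisson(rate)

rng = np.random.default_rng(538)
X = rng.standard_normal((1000, 5))
beta = np.array([0.3, 0.0, -0.2, 0.0, 0.1])
y = simulate_poisson(X, beta, beta0=1.0, rng=rng)  # n = 1000 counts
```

The exponential link means the count scale is sensitive to \(\|\beta\|\); weak- and strong-signal scenarios will differ markedly in the typical magnitude of \(\lambda_i\).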
Experimental Design: Predictors (X)
- Generation: \(X_{ij} \sim N(0, 1)\), columns standardized.
- Correlation Structures:
- Independent: \(\rho = 0\)
- Moderate: \(\rho = 0.5\)
- High: \(\rho = 0.8\)
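One concrete way to realize the correlation settings is an equicorrelated design, where every pair of columns has correlation \(\rho\). This is a sketch under that assumption (an AR(1) structure would be another common choice, and the slide does not pin down which is intended):

```python
import numpy as np

def simulate_X(n, p, rho, rng=None):
    """Gaussian design with Cov(X_ij, X_ik) = rho for all j != k,
    built from a shared per-row factor plus independent noise."""
    rng = np.random.default_rng() if rng is None else rng
    common = rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, p))
    X = np.sqrt(rho) * common + np.sqrt(1 - rho) * noise
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
    return X

X = simulate_X(n=1000, p=500, rho=0.5, rng=np.random.default_rng(1))
```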
Experimental Design: True \(\beta\)
- Sparsity \(k = ||\beta||_0\):
- \(k = 10\)
- \(k = 20\)
- \(k = 50\)
- \(k = 100\)
- Signal Strength:
- Simulate “Weak” \(\beta\) scenarios.
- Simulate “Strong” \(\beta\) scenarios.
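A simple way to generate the true coefficient vectors: place \(k\) nonzeros of a fixed magnitude (with random signs) on a random support. The magnitude standing in for "weak" vs. "strong" is an assumption here, since the slide does not fix the values:

```python
import numpy as np

def simulate_beta(p, k, strength, rng=None):
    """Sparse coefficient vector: k nonzeros of magnitude `strength`, random signs."""
    rng = np.random.default_rng() if rng is None else rng
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = strength * rng.choice([-1.0, 1.0], size=k)
    return beta

beta_weak = simulate_beta(p=500, k=10, strength=0.1)   # "weak" scenario
beta_strong = simulate_beta(p=500, k=10, strength=1.0)  # "strong" scenario
```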
Experimental Design: Parameter Tuning
- Lasso, Adaptive Lasso:
- 10-fold Cross-Validation.
- Select tuning parameter(s) by minimizing Poisson deviance.
- (Specify initial estimator and \(\gamma\) for Adaptive Lasso)
- SLOPE:
- Target FDR level \(q = 0.1\).
- Use BH-inspired sequence \(\{\lambda_i\}\).
Experimental Design: Evaluation Metrics
- False Discovery Rate.
- Power.
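Both metrics are simple functions of the selected support versus the true support; this is a minimal sketch (the example vectors are illustrative):

```python
import numpy as np

def fdr_and_power(beta_hat, beta_true):
    """Empirical false discovery proportion and power of the selected support."""
    selected = beta_hat != 0
    true_supp = beta_true != 0
    false_disc = (selected & ~true_supp).sum()
    true_disc = (selected & true_supp).sum()
    fdp = false_disc / max(selected.sum(), 1)      # 0 when nothing is selected
    power = true_disc / max(true_supp.sum(), 1)
    return fdp, power

beta_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
beta_hat = np.array([0.9, 0.1, -1.8, 0.0, 0.0])
print(fdr_and_power(beta_hat, beta_true))  # (1/3, 2/3): 1 false of 3 selected, 2 of 3 recovered
```

Averaging the false discovery proportion over the R = 50 replications gives the empirical FDR reported per setting.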